Building Support Tools for Russian-Language Information Extraction
Abstract
There is currently a paucity of publicly available NLP tools to support analysis of Russian-language text. This especially concerns higher-level applications, such as Information Extraction. We present work on tools for information extraction from text in Russian in the domain of on-line news. On the lower level we employ the AOT toolkit for natural language processing, which provides modules for morphological analysis and partial syntactic chunking. Since the outputs of both lower-level modules contain unresolved ambiguity, we synthesize the outputs and pass the result into a pre-existing English-language analysis pipeline. We describe how the information extraction system is adapted for multi-lingual support, including extensions to the ontologies and to the pattern matching mechanism. While this is work in progress, we present an end-to-end pipeline for event extraction from Russian-language news.
Similar Resources
Methodology for Building Extraction Templates for Russian Language in Knowledge-Based IE Systems
Valery Solovyev, Vladimir Ivanov, Rinat Gareev, Sergey Serebryakov, Natalia Vassilieva. HP Laboratories, HPL-2012-211. Keywords: event extraction; dictionaries; rules; patterns; meaning-text model; Chomsky grammars. In this technical report we describe methodology for building information extraction (IE) rules...
Increasing the Effectiveness of Russian Language Teaching for Special Purposes (to the Problem of Integration of Language Training with Information Technology Courses)
The article is devoted to the problem of increasing the efficiency of language teaching for the special purposes of foreign students in studying Russian at a technical university. Particular attention is paid to the training of foreign students in the skills of working with information using the latest computer technology. The conclusions of the work are based on the analysis of the results of ...
Creation of Reusable Components and Language Resources for Named Entity Recognition in Russian
This paper describes the development of the RussIE system in which we experimented with the creation of reusable processing components and language resources for a Russian Information Extraction system. The work was done as part of a multilingual project to adapt existing tools and resources for HLT to new domains and languages. The system was developed within the GATE architecture for language...
Adapting the PULS event extraction framework to analyze Russian text
This paper describes a plug-in component to extend the PULS information extraction framework to analyze Russian-language text. PULS is a comprehensive framework for information extraction (IE) that is used for analysis of news in several scenarios from English-language text and is primarily monolingual. Although monolinguality is recognized as a serious limitation, building an IE system for a n...
Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources
Automatic event extraction from text is an important step in knowledge acquisition and knowledge base population. Manual work in development of extraction system is indispensable either in corpus annotation or in vocabularies and pattern creation for a knowledge-based system. Recent works have been focused on adaptation of existing system (for extraction from English texts) to new domains. Even...